QSAR-driven rational design of novel DNA methyltransferase 1 inhibitors

DNA methylation, an epigenetic modification, is mediated by DNA methyltransferases (DNMTs), a family of enzymes. Inhibitions of these enzymes are considered a promising strategy for the treatment of several diseases. In this study, a quantitative structure-activity relationship (QSAR) modeling was employed to understand the structure-activity relationship (SAR) of currently available non-nucleoside DNMT1 inhibitors (i.e., indole and oxazoline/1,2-oxazole scaffolds). Two QSAR models were successfully constructed using multiple linear regression (MLR) and provided good predictive performance (R2Tr = 0.850-0.988 and R2CV = 0.672-0.869). Bond information content index (BIC1) and electronegativity (R6e+) are the most influential descriptors governing the activity of compounds. The constructed QSAR models were further applied for guiding a rational design of novel inhibitors. A novel set of 153 structurally modified compounds were designed in silico according to the important descriptors deduced from the QSAR finding, and their DNMT1 inhibitory activities were predicted. This result demonstrated that 86 newly designed inhibitors were predicted to elicit enhanced DNMT1 inhibitory activity when compared to their parent compounds. Finally, a set of promising compounds as potent DNMT1 inhibitors were highlighted to be further developed. The key SAR findings may also be beneficial for structural optimization to improve properties of the known inhibitors.


INTRODUCTION
Epigenetics is an alteration of gene expression without changing the genomic structure. Epigenetics machinery includes DNA methylation, histone modification (e.g., acetylation and methylation) and non-coding RNAs (Handy et al., 2011). Epigenetic regulation can be altered by exogenous factors such as diet and exposing environment (Kanherkar et al., 2014). Thus, epigenetic alteration serves as dynamic flexible responses which can be reversibly modified throughout the lifetime (Schuebel et al., 2016). Epigenetic regulation has been recognized to play an important role in driving cells toward normal cellular phenotypes and functions. An alteration of epigenetic regulation also has been noted in pathogenesis of many diseases including cardiovascular diseases (Abi Khalil, 2014), neurological diseases (Landgrave-Gómez et al., 2015), metabolic disorders (Kuneš et al., 2015) and cancers (Verma, 2013). Among all, DNA methylation is considered to be one of the most common modifications found in many diseases (Jin and Liu, 2018).
DNA methylation is a process by which a methyl group (-CH3) is transferred from Sadenosyl-L-methionine (SAM) to the C-5 position of cytosine residue of CpG islands, which are regions of large repetitive CpG dinucleotides (Wang and Leung, 2004). This reaction requires a key catalytic enzyme namely DNA methyltransferases (DNMTs). DNMTs are categorized into 5 types i.e., DNMT1, DNMT2, DNMT3A, DNMT3B and DNMT3L (Zhang and Xu, 2017). However, DNMT1 is considered to be the most stable epigenetic mark and is abundantly found in human cells (Hermann et al., 2004). DNMT1 acts as a maintenance methyltransferase which functions to control level of gene expressions by inhibiting gene transcription leading to gene silencing (Heerboth et al., 2014). An aberration of DNMT1 function (either hyper-or hypo-methylation) has been observed for many diseases. For example, an alteration of DNMT1 leads to an inactivation of several key genes in cancer cells (Esteller, 2008;Lee et al., 2010;Xu et al., 2011). Excessive DNMT1 activity and hypermethylation are also found in many neurological disorders (Wüllner et al., 2016;Yokoyama et al., 2017). Notably, these diseases are multifactorial disorders in which exogenous factors play crucial roles. Along with its reversible nature, a modification of DNMT1 activity serves as an attractive treatment strategy toward these diseases.
Recently, several DNMT inhibitors (DNMTi) have been reported to successfully treat many diseases (e.g., breast cancer (Gupta et al., 2019;Luo et al., 2018), pancreatic carcinoma , Huntington's disease (Pan et al., 2016), acute myeloid leukemia (Benetatos and Vartholomatos, 2016), myelodysplastic syndrome (Stresemann et al., 2008), sickle cell anemia (Fathallah and Atweh, 2006;Saunthararajah et al., 2003) and βthalassemia (Ley et al., 1982)). In addition, a demethylation by DNMTi displayed preferable clinical outcome against chemoresistance cancer cells which are not responsive to standard chemotherapy (Clozel et al., 2013). To date, two inhibitors (e.g., 5-azacitidine and decitabine) have been approved by the U.S. Food and Drug Administration (FDA) and European Medicines Agency for treatment of acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS) (European Medicines Agency, 2009;Nieto et al., 2016;Saba, 2007). Despite their high efficiency, these nucleoside inhibitors are unstable compounds with poor bioavailability and cytotoxicity (Erdmann et al., 2015). The cytotoxicity has been noted to be derived from their mechanisms of action which incorporates into DNA and RNA of the cells. Therefore, there is a growing interest of developing alternative non-nucleoside inhibitors to avoid these limitations. Some non-nucleoside inhibitors are along the way of development, however, their potencies are still lower than those of nucleoside inhibitors and none of them have been approved for clinical uses (Chuang et al., 2005;Valente et al., 2014;Zhong et al., 2016). Recently, quinoline scaffold has been reported to exhibit DNMT1 inhibitory activity (Zwergel et al., 2020).
Computational approaches have been recognized for their facilitating roles in drug development process (Prachayasittikul et al., 2015). Quantitative structure-activity relationship (QSAR) is a method to find a relationship between chemical structures of compounds and their biological activities and is one of the most commonly used approaches to increase success rate and reduce time of drug development (Nantasenamat et al., 2009). QSAR modeling reveals a set of key chemical features and physicochemical properties that are essential for potent activity which would be beneficial for guiding the structural design and optimization to obtain potential compounds with preferable properties (Prachayasittikul et al., 2015(Prachayasittikul et al., , 2017Pratiwi et al., 2019;Worachartcheewan et al., 2020).
In this study, QSAR modeling was performed to reveal structure-activity relationship (SAR) of indole-based (scaffold A) and oxazoline/1,2-oxazole-based (scaffold B) DNMT1 inhibitors (Figure 1). Multiple linear regression (MLR) algorithm was used for model construction to allow effective SAR analysis. To expand structural diversity, an additional set of 153 structurally modified compounds were rationally designed according to key descriptors obtained from QSAR findings and their activities were predicted. SAR analysis was performed to gain insights toward essential key features required for potent activity. Additionally, chemical space plots were generated to illustrate drug-likeness of the studied compounds. Finally, a set of promising novel DNMT1 inhibitors were highlighted to be further developed. SAR findings also would be useful for screening, guiding design and structural optimization of the related compounds for DNMT1 inhibition.

MATERIALS AND METHODS
A schematic summary of QSAR modeling process is presented in Figure 2.

Data collection
A set of bioactive compounds with DNMT1 inhibitory activity were collected from ChEMBL25 database (EMBL-EBI, 2019). The datasets have been thoroughly curated according to the established protocol (Fourches et al., 2010). The main steps of data curation are as followed (i) removal of inorganics, salts and mixtures, (ii) structural validation and cleaning, (iii) normalization of specific chemotypes, (iv) deletion of duplicates and (v) final checking. As a result, a final data set of DNMT1 inhibitors, comprising chemical structures of 15 inhibitors (in SMILES format) and their bioactivity (IC50 values), was primarily compiled from 5 original articles (Asgatay et al., 2014;Castellano et al., 2011;Erdmann et al., 2015;Siedlecki et al., 2006;Valente et al., 2014). Afterwards, these compounds were manually grouped according to their core structures into 2 groups i.e., scaffold A (indole derivatives) and scaffold B (oxazoline/1,2-oxazole derivatives), to obtain data sets consisted of 8 and 7 compounds belonging to scaffolds A (1a-8a) and B (1b-7b), respectively ( Figure 1). Bioactivities of DNMT1 inhibitors (IC50 values) were converted to pIC50 values by taking the negative logarithm based 10.

Geometry optimization
Chemical structures in SMILES format were converted into MOL format using molconvert (ChemAxon, 2018). All compounds were geometrically optimized using Gaussian 09 software (Frisch et al., 2009) to obtain low energy conformation by density functional theory (DFT) computation using Becke's three-parameter Lee-Yang-Parr hybrid functional (B3LYP) in concomitant with the LanL2DZ basis set.

Molecular descriptors
Molecular descriptors are a set of numerical values representing the molecules in terms of their connectivity, constitution and physicochemical properties (Nantasenamat et al., 2010). Two types of descriptors (i.e., quantum chemical descriptors and dragon descriptors) were used for QSAR modeling due to their interpretable nature.

Feature selection
To reduce overfitting and improve accuracy of prediction, correlation-based feature selection was employed for selecting a set of informative descriptors. Initially, Pearson's correlation coefficient (r) values were calculated for each pair of descriptor and bioactivity (pIC50). A cutoff value of 0.8 was used to select an initial set of correlated descriptors (with |r| ≥ 0.8) for further selection with multiple linear regression (MLR) method using SPSS (IBM Corp., 2011).
As a result, final sets of informative descriptors were obtained for further QSAR modeling.

QSAR model construction
MLR is one of the commonly used machine learning algorithms to reveal a linear relationship between a set of independent variables (i.e., molecular descriptors; Xn) and the dependent variable of interest (i.e., DNMT1 inhibitory activity; Y). In this study, two QSAR models were separately constructed according to their distinct scaffolds (i.e., scaffolds A and B). For each input data set, a set of selected descriptor values of the compounds along with their bioactivities (pIC50 value) were provided to train the machine. MLR models were constructed using Weka software (Hall et al., 2008) as shown in Equation 1: where Y is the pIC50 values of compounds, B0 is the intercept and Bn are the regression coefficient of descriptors Xn.

Model validation
The data set was divided into training set and testing set by leave-one-out cross validation (LOO-CV) (Roy et al., 2015). N is the number of samples in the data set. One sample was removed from the whole data set to be predicted, whereas the remaining samples (N-1) were used as the training set. The same sampling process was continued until every sample was leaved out to be predicted as Y variable (activity).

Evaluation of the predictive performance of QSAR model
Two statistical parameters such as correlation coefficient (R) and root mean square error (RMSE) values were calculated to assess the predictive performance of the constructed QSAR models (Prachayasittikul et al., 2017;Pratiwi et al., 2019).

Prediction of modified compounds
A set of 153 structurally modified compounds were rationally designed according to key descriptors of the constructed QSAR models. All newly designed compounds were drawn, geometrically optimized and calculated, in the same manner as the original compounds, to obtain a set of key descriptor values. Then, these key descriptor values were replaced in the QSAR equation (as X variables) to calculate predicted pIC50 values of the modified compounds.

QSAR models
A set of bioactive compounds with DNMT1 inhibitory activity were collected from ChEMBL25 database (EMBL-EBI, 2019) and were preprocessed according to the established protocol (Fourches et al., 2010). A set of curated compounds were divided into 2 groups (i.e., scaffolds A and B) according to their core structures ( Figure 1). All compounds were optimized and calculated to obtain their descriptor values (as a set of structural representatives). Correlation-based feature selection followed by MLR method were performed to obtain a final set of 6 informative descriptors. Definitions of selected descriptors (Table 1) and descriptor values ( Supplementary Tables 1-2) of the investigated compounds are provided. Values of selected descriptor together with the bioactivity (pIC50 values) were used as input data sets to construct the QSAR models using MLR algorithm. Herein, two QSAR models were separately constructed based on core structure of the compounds (i.e., scaffold A and scaffold B).
For scaffold A, two informative descriptors (i.e., ) were used to construct QSAR model (Equation 2). An influence of each descriptor on pIC50 value was demonstrated by its regression coefficient value. The QSAR model revealed that bond information content (BIC1 with regression coefficient = 3.9879) is the most influential descriptor for predictive DNMT1 inhibitory activity of indoles. Four selected descriptors were used to build the QSAR model of scaffold B (Equation 3) including electronegativity (R8e and R6e+), van der Waals volume (RDF045v) and topological distance (B09[N-N]) descriptors. R6e+ and B09[N-N]) descriptors had positive effects on the activity of oxazoline and 1,2oxazole inhibitors as shown by positive regression coefficient values, whereas negative effects were observed for those with negative regression coefficient (i.e., R8e and RDF04v). The R6e+ was shown to be the most influential descriptor with regression coefficient value of 10.1847. In overview, the constructed QSAR models provided acceptable predictive performance, as shown by high R 2 (0.672-0.988) but low RMSE (0.041-0.224) values. The calculated parameters representing model's performance are summarized in Table 2. Good predictive performance of the models was observed with low difference between experimental and predicted activities of scaffolds A and B (Table 3). Comparative plots of the experimental and predicted pIC50 values of the scaffolds A and B are shown in Figure 3.

Application of QSAR models for the rational design and prediction of novel DNMT1 inhibitors
The constructed QSAR models were further applied for the rational design of a novel series of 153 structurally modified compounds with relevant scaffolds. The important descriptors presented in the model were used as a guide for structural modification strategy. Finally, 153 derivatives of scaffolds A (80 modified compounds) and B (73 modified compounds) were virtually designed (Supplementary Figures 1-2), in which their descriptor values were calculated and subsequently applied to the QSAR equations for predicting their activities (Supplementary Tables 3-4). As a result, a series of modified compounds with improved activity (when compared to their parent compounds) are summarized in Figures 4 and 5. The promising novel compounds with the most potent predicted activities are highlighted such as compounds 3a11 and 2b8 ( Figure 6).

Understanding structure-activity relationship (SAR)
In-depth SAR analysis was performed to consider the important chemical features governing bioactivity of the original scaffolds A and B as well as the modified compounds.

Scaffold A
Scaffold A is a series of indole-amino compounds (Figures 1 and 7) which contain ring A (indole and its analogs) substituted by 2-aminocarboxylic acid side chain (1a-5a and 8a) and aza-indole ring A (6a and 7a). Bioactivity of these compounds (Table 3) was ranked as 1a~3a > 4a > 2a > 5a > 7a > 6a > 8a. Two most potent compounds (1a and 3a) displayed the same activity with pIC50 value of 4.70. These two compounds had propanoic acid (three carbon atoms) linker between the indole ring A and 2-amino group (as imide ring for compound 1a, as aminothiopyridine substituted by 3-NO2 group for compound 3a). For 2,4-dinitrobenzene analog 4a, it showed lower activity (pIC50= 4.40) when compared with nitropyridine 3a. Compound 2a (amino group as fused ring of imide) displayed a lower activity (pIC50= 4.30) when compared with the imide ring (1a). With the longer linker (four carbon atoms), compound 5a containing 2-aminobutanoic acid side chain was shown to be less active (pIC50= 4.10) when compared to dinitrophenyl compound 4a. When N atom of the indole ring was replaced by S atom, benzothiophene analog (ring A) 8a was obtained with the lowest activity (pIC50 = 3.64). Aza-indoles 6a and 7a (derived from replacing C atoms in the indole ring by one N and two N atoms, respectively) showed lower activity than the others (1a-5a), but higher than the thioindole (8a).     To achieve compounds with more improved activity, core structure or scaffold of the compound and its functional substituents could be modified. For scaffold A (Figures 1  and 7), compounds 1a-5a were structurally modified in which the indole ring A is either conserved or changed to pyrrole ring as well as the 2-amino moiety was substituted by Z group (NO2, NH2, OH, SH, OCH3, CH3, CF3 and F). The results (Supplementary Figure 1 and Supplementary Table 3) of indole ring and substitution at imide ring by the corresponding Z groups provided derivatives 1a1-1a8. When the indole ring A was changed to pyrrole ring and imide ring was substituted by Z groups, the modified compounds 1a9-1a16 were achieved. In 1a series, most of the compounds showed less potent activity than the parent compound. Pyrrole derivatives with Z= NO2 (1a9) and NH2 (1a10) displayed the most improved activity with comparable activity (pIC50 = 4.88 and 4.86, respectively) when compared with the indole series (1a1, pIC50 = 4.79, Z = NO2 and 1a2, pIC50 = 4.77, Z = NH2). This indicated that NO2 group (Z) is the most effective substituent on the imide ring of both indole (1a1) and pyrrole (1a9) analogs.
Similar results were noted for modified series 2a (2a1-2a16), most compounds showed the improved activity, except for compound 2a6. Both indole (2a1) and pyrrole (2a9) bearing Z = NO2 displayed the most improved activity (pIC50 = 4.82 and 4.98, respectively). This indicated that the pyrrole exerted higher activity than the indole.
In modified compounds 5a (5a1 to 5a15), most compounds showed the improved activity, except for 5a5. Pyrrole compound (5a9) was the most improved one with pIC50 = 4.43 whereas the parent compound 5a showed the pIC50 of 4.10.
In case of compounds 3a and 4a, all modified compounds (3a1-3a16 and 4a1-4a15) displayed the improved effect. It was shown that aminothiopyridine 3a11 was the most potent modified compound (pIC50 = 5.09), which is the pyrrole analog bearing 2-amino moiety substituted by SH (Z) group. In addition, the most improved modified 4a9 (pIC50 = 4.81) was pyrrole derivative containing 2amino moiety substituted by OH (Z) group. Notably, the structurally modification of fused indole ring (1a-5a) provided the improved compounds as the single ring (pyrrole) compounds.
Compound 7a (as bis-triazole condensed ring, pIC50 = 4.00) was transformed to pyrrole (7a1) and indole (7a2) analogs with the remaining one triazole ring. The improved effect was noted for the indole analog 7a2 (pIC50 = 4.10). Modified compounds (Supplementary Figure 1) in series 1a-5a and 7a, mostly showed the improved activity (Supplementary Table 3 compared with its parent compound (3a) as well as with other modified compounds in scaffold A. The highest neighborhood symmetry (BIC1) of compound 3a11 could be resulted from the single pyrrole (ring A) and a smaller group (Z = SH) on the thiopyridine moiety, which make the molecule even more symmetry than the indole ring A bearing the larger group (Z = NO2) of compound 3a. Structural features of 3a and 3a11 ( Figure 6) showed that these compounds had the same value of F06[N-O] = 4 representing by 6 bonds, which are a part of linkage between N atom (indole/pyrrole rings) and O atom of carboxylic moiety, whereas such property was not seen in the thioindole 8a ( Figure 8A).
The QSAR study (Equation 2) showed that the two most potent compounds 1b and 2b displayed significant descriptors (Supplementary Table 2 It should be noted that the oxazole 2b had the lowest van der Waals volume (RDF045v = 4.693) and low value of electronegativity (R8e = 0.437) when compared with the oxazole 3b having the highest van der Waals volume (RDF045v = 5.677) and high electronegativity (R8e = 0.516). The highest RDF045v value of compound 3b could be due to the presence of CH2 group (n = 1, Figure 8B) linking between the oxazole and nitro phenyl rings whereas the compound without CH2 group had the lowest van der Waals volume as noted for the most potent compound 2b. The improved activity of scaffold B compounds was performed ( Supplementary Figure 2 and Supplementary Table 4) as followed.
Notably, the most improved compound 2b8 displayed the lower values of R8e = 0.421 and RDF045v = 4.547, but higher value of R6e+ = 0.035 as compared to the parent compound 2b (R8e = 0.437, RDF045v = 4.693 and R6e+ = 0.031). The lower van der Waals volume (RDF045v) and lower electronegativity (R8e) of the most improved compound 2b8 (Figure 6) could be due to the smaller size and less electronegativity of the NH2 (R) group comparing with the nitro group of its parent 2b.

Chemical space of the studied compounds
Not only potent bioactivity but also druglike properties of the compounds are essential for successful drug development. Chemical space exploration is a method to investigate the drug-likeness of the compounds in which the Lipinski 'rule of five' is used as a guideline to determine drug-like properties (Reymond and Awale, 2012). The chemical space plots of the inhibitors and the related discussion are provided in Supplementary Information ( Supplementary Figures 3-4). It was demonstrated that most of the investigated compounds were distributed within a space of the Lipinski 'rule of five', which indicate their potential to be further developed as drugs.

CONCLUSION
Current attention has been given to the epigenetic targets due to their modifiable nature throughout lifetime. Among these targets, DNMT1 is a promising target which plays roles in many diseases. In this study, QSAR modeling of indole-based and oxazoline/oxazole-based DNMT1 inhibitors was performed along with in-depth SAR analysis. Two models were successfully constructed providing good predictive performance. A set of key structural features influencing the DNMT1 inhibitory effect of the compounds were revealed such as bond information, frequency of [N-O], electronegativity, van der Waals volume and topological distance. To increase structural diversity, the QSAR findings were further applied as a guide for in silico structural modification to design a set of 153 novel inhibitors and their activities were predicted. Finally, a set of promising newly designed inhibitors were highlighted to be further developed as potential DNMT1 inhibitors for therapeutics. In summary, this study demonstrates the facilitating role of QSAR modeling toward effective drug development in terms of rational design, screening and structural optimization. However, further synthesis of promising modified inhibitors and experimental studies are required to confirm DNMT1 inhibitory activity.

Supplementary information
Supplementary information is available on the EXCLI Journal website.